PAPER # 7032

Rightsizing Your Mainframe Performance Criteria

by James A. Hepler
Hewlett-Packard
39550 Orchard Hill Place
Novi, Michigan 48376-8024
313-380-2207

INTRODUCTION

Rightsizing, downsizing, or Mainframe Alternative solutions provide a major opportunity for cost savings while often improving service to the user community. This paper discusses the predominant issues involved in decision making when approaching a Mainframe Alternative (MFA) situation, concentrating on those issues relating to performance. While not all of the issues listed are directly related to performance, there may be an indirect relationship or an impact on service levels that should be considered as part of the overall solution.

MFA covers a broad spectrum of issues, and a paper of this type cannot begin to address all of the possible situations or obstacles that may arise. The intent here is to focus at a high level on the areas the analyst will generally have to review in order to size a system, choose a data base, pick a platform, select an application, and so on as part of the planned mainframe alternative solution. Every one of these areas would make an excellent topic for another paper allowing more technical detail and more case studies, but a global view is also needed. This paper presents that view from the top.

MAINFRAME ALTERNATIVE DRIVING FORCES

"74% of mainframe users are either investigating, currently migrating from, or have completely migrated from mainframes." (Dataquest, 1991)

What are those mysterious forces driving computer industry users away from the traditional mainframe environment? There are many factors named in a variety of ways, but they can all be boiled down to a small number, and if those are examined closely, each eventually traces back to a cost benefit. The differentiator is how soon that cost benefit is realized and how easily it can be measured. For the sake of simplicity, think of cost savings as short term savings, or at least a short term return on investment. Because of technological advances in computer hardware and software, there is an obvious, up-front savings in operating costs when a move from the mainframe environment to a state-of-the-art system and operating environment is completed. But does the cost of migration or conversion to a new environment, and the risk to the corporation, justify the investment? In about 50% of the cases I have been involved with, this initial cost saving exceeded the conversion or migration investment.

A second driving force is the strategic decision to move to a more Open Systems environment to take advantage of all the opportunities offered. This is also easy to justify with cost savings, but the investment may be higher, especially when a move to new client/server application technology is to be accomplished at the same time. The return on Information Systems (IS) investment may be farther in the future, but advantages to the end users, application maintenance, and operations can be used to accelerate the financial payoff. This movement to Open Systems is fueled by requirements for features like RAD (Rapid Application Development) and CASE (Computer Aided Software Engineering) tools. Easier access to data for end users and decision support, reduced application maintenance, better networking capabilities, vendor independence, desktop integration tool sets, better data integrity features, and many other considerations are important facets of the Open Systems environment.
These types of Information Technology (IT) justifications are based on gaining competitive advantages such as faster time to market for new products, better product offerings at less cost, more flexible customer service, and so on. If a company can access its data faster, more intuitively, and with greater flexibility, it can make decisions faster and provide better service to its internal and external customers. This allows the company to respond more quickly to changing business needs. Open Systems offer more portability and enabling software, so that as technological advances and standards emerge in the future they can be implemented easily. The obstacles experienced moving from the mainframe environment now will be greatly reduced when changes are needed in the future. If the Open System is RISC (Reduced Instruction Set Computing) based, the scalability advantages of RISC will be a large bonus to the user community.

Companies are also examining mainframe alternatives as part of a reduction in vendor risk and cost. Many computer hardware and software vendors in the mainframe world are reducing their level of support while increasing the support costs and licensing fees to the customer. Many customers are more concerned about the support reduction than the increasing costs, but it is a problem they cannot ignore. Third-party software licensing fees also tend to be larger on mainframes than on open platforms. Many customers considering an alternative are also concerned about the future direction of some vendors' proprietary product offerings. POSIX compliance is being offered by some vendors as a way of opening their proprietary operating environments, but this does not seem to be happening in the traditional mainframe world.

A trend seen recently is that new college graduates do not want to work in IS departments with tools they consider antiquated or with third-generation languages. They prefer to work with new Graphical User Interfaces (GUIs) like MOTIF and Windows, fourth-generation languages, and CASE tools. An IS staff with a need for entry-level programmers and programmer-analysts will need to be concerned with this at an escalating rate. Perhaps even more critical are the morale problems of existing staff members who feel they are falling behind in technical expertise because of the age of their systems architecture and supporting tools. There is a growing faction of systems personnel who feel that character-mode dumb black-and-white terminals, and even PCs with 286 or older chip sets, are anchors holding them down. They feel these devices would be more useful at the bottom of a lake, where a boat anchor should be. These people may console themselves with the fact that there are still systems out there with punch cards as their primary source of data entry.

These issues are creating the interest in mainframe alternatives. The benefits can be enormous. There are, however, many concerns to be addressed when considering a move to a new and lesser known environment. An important criterion in studying the mainframe alternative is to offer the same or better functionality to the users and IS staff while maintaining the same performance service levels. In many cases, customers will accept a slight reduction in service if there are substantial other benefits or cost savings. For example, on their previous mainframe system, a customer was getting an average response time of 1.7 seconds. After porting the same application to a less expensive open system, the average response time was measured at 1.95 seconds. While it was measurably slower, there was no perception of degraded performance by the users. The system resources were not being taxed, and a performance analysis showed that response time would remain under 2.0 seconds with a 30% increase in usage. This was well within acceptable limits for this customer.
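How might an analyst project that response time will stay under a threshold as usage grows? The following is a minimal sketch using a simple open, single-queue (M/M/1-style) model; the per-transaction service time and arrival rates are hypothetical figures chosen to loosely mirror the example above, not the customer's actual data. A real study would use measured per-transaction resource demands from a performance tool.

    # Minimal sketch of a headroom projection with a simple single-queue model:
    # average response time = service_time / (1 - utilization).
    # All figures are hypothetical illustrations, not measurements.

    SERVICE_TIME = 1.85      # seconds of total resource time per transaction (assumed)
    CURRENT_RATE = 0.03      # transactions per second today (assumed)

    def response_time(rate_tps):
        utilization = SERVICE_TIME * rate_tps
        if utilization >= 1.0:
            raise ValueError("offered load exceeds capacity")
        return SERVICE_TIME / (1.0 - utilization)

    for growth in (1.0, 1.3, 2.0, 5.0):
        r = response_time(CURRENT_RATE * growth)
        print(f"{int((growth - 1) * 100):>3d}% more usage -> {r:.2f} s average response")

Even a crude model like this makes the key point visible: on a lightly loaded system, a 30% growth in usage moves response time only slightly, while much larger growth eventually pushes it past the 2.0 second threshold.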
FACTORS AFFECTING PERFORMANCE

There are many factors that can have an impact on system or server performance, and others that affect performance in a distributed or client/server environment. They fall into five basic categories:

- CPU speed and utilization
- Disk I/O rates and demand
- Memory access rates and utilization
- Network delays
- Software locks and latches

The traditional factors of CPU, disk I/O, and memory utilization are familiar to most people involved with system performance. The basic concept is that these are resources required to do work on a computer system, and the amount required of each resource and the time it takes to acquire and use that resource affect response time or throughput. There have been many presentations and papers at Interex covering those topics.

The network delay factor includes not only the speed of the network and queuing within the network, but also delays due to the work performed on the remote server or on the client system if one is involved. Stated another way, for a transaction on system A to complete, it may have to wait for part of the transaction to complete on system B. From the local server's point of view, this appears to be part of network delay because it takes place outside the local system.

Software locks and latches refers to other types of software delays. These can include file locks, contention for data base buffers, messaging delays for local cooperative processing, artificial locks such as those for local mail boxes, and other delays that are not tied to the physical resources of the system.

IMPLEMENTATION METHODS

There are five basic implementation methods to meet the business need in a mainframe alternative situation: TRANSFER, CONVERT, REPLACE, REWRITE, and SURROUND.

TRANSFER means moving the same application from the mainframe to the alternative platform. Many financial and other application packages run on several platforms and can be transferred with a minimum amount of effort. CASE tools, 4GLs, Executive Information Systems, and other types of application software can be ported; examples are Lawson Financials, FOCUS, and SAS. The advantages of TRANSFER include an easier transition for program maintenance and little or no training for end users.

REPLACE, substituting an "off-the-shelf" package for the current application, offers many advantages. A new package may have a better feature set than a ten-year-old package or a user-developed package. A package specifically designed and tuned for the open environment utilizing client/server concepts will likely offer better response and throughput than existing systems. End users will of course need to learn the new package. It is likely that maintenance will be easier because of the absence of "spaghetti" code and the newer available tools.

The best example of the CONVERT strategy is converting CICS/COBOL applications to run in the open environment. This could be an emulation, a coexistence strategy, or a movement to the C language with a new GUI.
There are advantages and disadvantages to this strategy from a performance perspective; some cases have shown performance improvements, others degradation. The largest advantage is that there are utilities and services that aid in the conversion rather than requiring a total rewrite of the applications. Automated conversions can be performed by third parties with tools and methodologies developed precisely for this purpose. With this strategy it is also possible to change to a new data base or file structure, to a new user interface, and even to a new language if desirable. For example, customers have converted from DB2 with COBOL and CICS to VPLUS with TurboImage under MPE/iX, or to COBOL with CURSES and Oracle under HP-UX. As with TRANSFER, the end users would not need retraining, and the programmers would be familiar with the source code since it is essentially the same. There may have to be training on the new data base or user interface.

A REWRITE may have the advantage of allowing an improvement in the process, a conversion to client/server, or the adoption of a new data base or file system. There may be a performance edge over some of the other possible implementation paths. With the current Rapid Application Development tools this may be a desirable solution, for off-load application targets in particular. End users will have to learn the new system, but on the other hand they can contribute to a better design. Clearly there is a relatively massive programming and design effort involved, and the expense associated with the REWRITE option is generally high as a result.

The SURROUND strategy is one where the mainframe sits in the middle of a network of open systems as a server. The mainframe may have a corporate data base so cumbersome that conversion is too complex, and there also may be certain applications that are difficult to migrate and require the mainframe environment. With the newer Middleware tool sets it is possible to use open systems as clients to a large mainframe data base while using the Structured Query Language (SQL) tools available in the open environment. In the purest sense, the SURROUND strategy means coexistence with the mainframe while using the enabling tools on the open systems around it. The main advantage of this solution is that the investment in the data base design is protected. Other advantages include the use of SQL-type tools and the ability to use less expensive systems as clients to the mainframe server. The mainframe is off-loaded because the application runs on the client systems. This can prolong the life of the mainframe and provide cost avoidance for mainframe upgrades.

These implementation strategies have performance advantages and disadvantages in different situations. Deciding whether the solution selected is viable may depend on judgment and knowledge of the performance situations discussed in the sections that follow.

APPLICATION SELECTION

A mainframe alternative may involve off-loading one application or moving the entire suite of applications. Often it is desirable to select one application first. The methodology for selecting this first application varies depending on the needs of the customer. Inherent in any selection must be the acceptability of the resultant response or throughput.
Some criteria for application selection are:

- Query only
- Decision support
- 3 to 6 month development cycle
- Less than 80,000 transactions per day
- Relatively small disk files
- Packaged applications
- Mission critical application
- Feasible project

The most important thing to consider when picking an application is to be sure the application selected is not "just a test." If the organization is just "kicking the tires," there will not be enough interest or commitment to see the project through to completion. Select one that must succeed.

CPU SIZING

There is no technique for accurately sizing a replacement CPU with a different architecture. The traditional methods use industry benchmarks, marketing information, number of users, number of transactions per hour, MIPS ratings, I/O rates, and many other types of metrics to estimate which CPU can best fit the needs of the application users.

Analytic modeling has been used to study the feasibility of migrating while still providing suitable performance. If information on CPU per transaction, disk I/O per transaction, average think time, and number of transactions per hour is known, analytic modeling tools can be applied for interactive applications. For batch modeling it is necessary to know the CPU and disk I/O per job and the number of jobs per hour. Inaccuracy is introduced by not knowing the relationships between the mainframe environment and the new environment. Differences in disk technology and file system access technology must also be considered, and operating system differences such as internal buffering mechanisms can have a large impact on this type of modeling.

It is necessary to create a ratio of mainframe CPU (MFCPU) seconds to alternate CPU (ALCPU) seconds to migrate the demands within the model. This is done by using ratios of known performance benchmarks with similar applications, or other criteria, to estimate the relationship. For example, if it is known that the MFCPU is rated at 50 TPC-A transactions per second and the ALCPU is rated at 60 TPC-A transactions per second, it is logical to use that ratio for on-line transactions. It is logical, but it may not be accurate. A minimal sketch of this ratio arithmetic follows.
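The sketch below shows the kind of back-of-the-envelope scaling the ratio implies. The benchmark ratings are the ones from the example above; the per-transaction CPU time and the peak transaction rate are hypothetical assumptions added for illustration, not measurements.

    # Minimal sketch of ratio-based CPU sizing.  The TPC-A ratings come from
    # the example in the text; the other inputs are hypothetical assumptions.

    MF_RATE_TPCA = 50.0      # mainframe rating, TPC-A transactions per second
    AL_RATE_TPCA = 60.0      # alternative system rating, TPC-A transactions per second
    SPEED_RATIO  = AL_RATE_TPCA / MF_RATE_TPCA   # ALCPU assumed 1.2x the MFCPU per second

    mf_cpu_per_txn_s  = 0.060    # mainframe CPU seconds per transaction (assumed)
    peak_txn_per_hour = 18000    # peak on-line transaction rate to be migrated (assumed)

    # Translate the CPU demand into alternate-CPU seconds and project utilization.
    al_cpu_per_txn_s   = mf_cpu_per_txn_s / SPEED_RATIO
    al_cpu_utilization = al_cpu_per_txn_s * peak_txn_per_hour / 3600.0

    print(f"ALCPU seconds per transaction: {al_cpu_per_txn_s:.3f}")
    print(f"Projected ALCPU utilization at peak: {al_cpu_utilization:.0%}")

A projection like this is only as good as the assumed ratio; as cautioned above, the TPC-A relationship may not hold for a particular application's mix of work.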
In this type of scenario, analytic modeling is often used as a sanity check after other CPU sizing efforts using more traditional methods, such as comparing similar installed applications at known sites, have been completed. Another method is to examine how many users most installed customer systems are supporting while doing similar transactions. It is impossible for any of these methods to be correct 100% of the time. Most analysts responsible for sizing systems in mainframe alternative scenarios do it based on a combination of these methods, mixing in a great deal of experience to finally come up with "the answer." Generally a conservative approach is used, and if there is any doubt about which CPU is most likely to succeed, the faster CPU is selected.

It is too simplistic to look at a competitive information chart from one vendor or from an independent third party and say that since the ALCPU is the same speed as the MFCPU it can be a replacement with the same performance. For a first estimate this might be acceptable, but more study is needed. Fortunately, it is common for MFCPU systems to have performance reports generated; unfortunately, they are not always accurate. Another consideration is that if only one application is moving from the mainframe, it will be necessary to size based on the portion of the mainframe that application is using. The trap here is that if the application is using 35% of the MFCPU at a peak time, it would be easy to assume that an ALCPU could be used that is 35% of the speed of the MFCPU. This probably would not be true.

This is the advantage of analytic modeling. The differences in CPU speed per transaction can be considered, and the differences in disk technology can also be part of the model. Queues created by waiting for the various resources can be predicted, and a resultant response time can be estimated. In a similar fashion, batch throughput can be estimated. If a batch job takes 8 hours on the mainframe and the ALCPU is 10% faster, it could be estimated that the batch job would run 10% faster. This might be approximately correct, but it is more complex than that. The CPU component of the 8 hours might be only 2 hours, with the remainder attributed to disk I/O. After migration to the ALCPU, the disk I/O might take 6.5 hours and the CPU 1.8 hours, so the total job would now take 8.3 hours even though the CPU is faster. Again, this is where analytic modeling may be useful.

Even with analytic modeling tools and services (HPCAPLAN), it is often necessary to benchmark or do some sort of pilot. Benchmarking is expensive and perhaps only gives an estimate of actual production results. A pilot may reveal unexpected technical issues and is of value as a proof of concept. Often a pilot or small benchmark can be accurately measured and used as a basis for sizing other application migrations. Modeling is, in most cases, a more flexible and less expensive way to make a good business decision on system sizing than benchmarking; benchmarking is often more accurate if performed properly, however. A good performance consultant can recommend the best approach for the mainframe alternative situation being considered.

DISK CONSIDERATIONS

While there are many disk considerations in mainframe alternatives, such as RAID, file space, and mirroring, from a performance point of view the most important factors are the amount of I/O required to do transactions in the alternative environment and the length of time it takes to do an I/O. Similar to the discussion of CPU alternatives, the alternative platform generally will have disk capacities smaller on average than the typical mainframe, with access rates similar but probably slower; there are exceptions to this guideline, however. This means that disk I/O in an alternative environment may be slightly slower. To offset this handicap, the alternative environments may have better access methods that reduce the amount of physical I/O needed to do the same work. The net of this is that the alternative environment is slightly faster in some cases and slightly slower in others. The major disk performance concern is generally the I/O system inherent in the data base selected; that will be discussed in the data base section.

There are often operating system requirements for disk space, such as a certain percentage of free disk space for optimum performance and adequate temporary and sort space availability. In sizing an alternative solution it is critical to allow for sufficient sort space in particular. Another area that is often overlooked is limitations to growth in the areas of file systems, total disk space in a single system, and limitations in the data base caused by structural situations. There may be a limitation to table size in the alternative relational data base that did not exist in the mainframe data base.
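Returning to the batch example from the CPU sizing discussion, the same per-component thinking shows why disk I/O usually dominates migrated batch elapsed time. The sketch below works that arithmetic explicitly; the hour figures and speed ratios are the hypothetical ones used in the text, not measurements from a real migration.

    # Minimal sketch of estimating migrated batch elapsed time from its CPU
    # and disk components.  Figures mirror the hypothetical example in the
    # text: an 8-hour mainframe job with 2 hours of CPU and 6 hours of disk I/O.

    mf_cpu_hours  = 2.0    # CPU component on the mainframe (assumed)
    mf_disk_hours = 6.0    # disk I/O component on the mainframe (assumed)

    cpu_speed_ratio  = 1.10        # ALCPU assumed 10% faster per CPU second
    disk_speed_ratio = 6.0 / 6.5   # alternative disk assumed somewhat slower

    al_cpu_hours  = mf_cpu_hours / cpu_speed_ratio
    al_disk_hours = mf_disk_hours / disk_speed_ratio

    print(f"Mainframe elapsed time:   {mf_cpu_hours + mf_disk_hours:.1f} hours")
    print(f"Alternative elapsed time: {al_cpu_hours + al_disk_hours:.1f} hours")

This serial-sum view ignores any overlap between CPU and disk and any queuing for shared devices, which is exactly the kind of refinement an analytic model adds.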
Analytic modeling techniques can take this type of knowledge of disk I/O situations and predict both on-line and batch performance, as discussed in the CPU section previously. As with CPU, the analyst will take all known factors, mix in any information on similar installed customer systems and a smattering of experience, and come up with a disk configuration that allows for acceptable performance. A conservative approach is generally used with disk I/O as well. Channel and controller speed can also affect I/O access rates when a system is busy enough; this is usually only indirectly considered in analytic modeling but should be examined by the analyst if the I/O rates are high enough. Disk performance analysis usually starts with "Will it fit on the spindles?", "Will it fit in the file system?", and "Will it fit in the data base?" Sometimes after those questions are addressed, as with CPU, a pilot or benchmark may be needed to see what will really happen with disk I/O. The simplest scenario to predict is the TRANSFER, because of the large number of potential changes in the other implementation methodologies.

The SURROUND strategy has a very interesting disk perspective. There are two basic scenarios: either the mainframe contains all of the data, or the mainframe is a central repository server with distributed data bases on other servers linked to the corporate data base. The idea is to let the mainframe be the data base I/O engine but let the users use tools on the client systems that give them the access they need. From a performance perspective, we are using less expensive distributed MIPS on the alternative systems while maintaining the corporate data base intact, thus avoiding potential upgrades or poor response on the mainframe. This strategy is very important in very large environments and is a way of improving performance and price/performance while the distributed data base environment is evolving. Part of the SURROUND plan is to realize that, long term, even the mainframe data base will be distributed to alternative networked systems. In this client/server type of arrangement, the Middleware and the network become components of the response and should be considered when trying to predict migrated response and throughput. Middleware is software that allows client/server applications to be easily developed and supported.

NETWORKING ISSUES

There are many networking issues with mainframe alternatives. From a performance perspective, we must be sure that the network is adequate to handle the new client/server traffic quickly enough. If there is to be network backup and recovery of files and transactions, the network components need to be able to handle that. If client/server is part of the direction or application mix, the performance implications of the two-tiered or three-tiered approaches will need to be considered. Connectivity and functionality issues abound, but are beyond the scope of this paper. (See the April 26, 1993, issue of COMPUTERWORLD for my article on client/server network performance issues titled "Network Jam".) Clearly, file transfer and applications such as EDI can have a serious impact on system performance and business success. Proper system sizing will need to take these often overlooked loads into account.
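One simple check that catches many of these overlooked loads is comparing the volume of data to be moved against the usable bandwidth of the link. The sketch below is a minimal, hypothetical illustration: the file size, line speed, efficiency factor, and window are assumptions chosen for the example, not characteristics of any particular network.

    # Minimal sketch: will a bulk transfer (file transfer, EDI batch, or
    # network backup) fit in its window?  All inputs are hypothetical.

    file_size_mb    = 2000.0   # data to move each night, in megabytes (assumed)
    line_speed_mbps = 10.0     # nominal link speed, megabits per second (assumed)
    protocol_factor = 0.6      # usable fraction after protocol overhead and sharing (assumed)
    window_hours    = 4.0      # time available for the transfer (assumed)

    usable_mbps    = line_speed_mbps * protocol_factor
    transfer_hours = (file_size_mb * 8.0) / usable_mbps / 3600.0

    print(f"Estimated transfer time: {transfer_hours:.1f} hours "
          f"({'fits' if transfer_hours <= window_hours else 'does not fit'} "
          f"in a {window_hours:.0f} hour window)")

Estimates like this are deliberately rough; a network/performance monitor, as recommended below, is the way to confirm what the links actually deliver.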
Another possible area for consideration is the overhead and recovery time associated with two-phase commits in distributed data bases. In the simplest sense, client/server distributed data bases require a two-phase commit for a transaction to complete, and this implies overhead on both the client and the server as well as possible network delays. This becomes even more complex when a three-tiered approach is used, or when a data base is distributed throughout the network of servers, requiring multiple two-phase commits and extremely complex staging of transactions and recoveries when a server is down. These complex recoveries can cause major temporary performance degradation.

It is highly recommended that a network/performance monitor be part of the mainframe alternative solution set. This will help identify and size particular parts of the network that may be slowing data transfers, either through high error rates or through too small a pathway for the demand. Network planning and design are critical to the successful implementation of a mainframe alternative solution. The important thing is to be sure there is enough bandwidth on all connections to support the data to be transferred in a rapid enough fashion.

OPERATIONS & SYSTEM MANAGEMENT

Performance monitoring, system tuning, and reporting of service levels are important components of a mainframe alternative solution. Systems need to be monitored for sufficient memory, CPU, and disk accessibility. Items such as system tables and buffering also need to be monitored and reconfigured as necessary. Often these exercises lead to suggested improvements in the data base or application code.

Software performance engineering techniques are highly recommended if a REWRITE implementation is indicated. Exceptional performance can be designed into the code and the data base rather than attempting to retrofit it later at a higher cost and a lower success rate. These techniques can be applied to the other implementation methodologies, but the best investment is in new designs.

Consulting may be needed to establish metrics for service level agreements or to implement performance tools properly in the new environment. Without performance tools, the customer is "driving his system in the dark." Reporting mechanisms need to be established as well; these tend to be standard in the mainframe world but are often overlooked in open environments. Capacity planning and other types of long and short range business consulting are also recommended.

Backup and recovery strategies are also important and are sometimes considered performance issues. If the backup strategy selected is too time consuming, operations will be more expensive and may interfere with production work because of the infamous race to daylight. Is a lights-out environment desirable? What about historical or hierarchical storage on optical disks? What about disk mirroring or SPU Switchover to reduce the probability and duration of downtime? These types of facilities are all available in the alternate environments, though in a different way than in the mainframe shop.

There are data integrity issues that have an impact on performance. For example, if there is to be some type of data base logging, there is certainly overhead in both the CPU and disk I/O areas of the system. This overhead can be as high as 10% extra on everything a user does. This factor should be considered in the CPU and disk configurations as discussed earlier.
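Folding that overhead into the earlier sizing numbers is straightforward. The sketch below is a minimal illustration; the baseline utilizations and the 10% logging factor are assumptions carried over from the discussion above, not measurements.

    # Minimal sketch: inflating projected CPU and disk utilization to allow
    # for data base logging overhead.  The baseline figures are hypothetical
    # (e.g., the 25% CPU projection from the CPU sizing sketch earlier).

    LOGGING_OVERHEAD = 0.10   # up to 10% extra work per user, per the text

    projected = {"cpu": 0.25, "disk": 0.40}   # utilization before logging (assumed)

    with_logging = {resource: util * (1.0 + LOGGING_OVERHEAD)
                    for resource, util in projected.items()}

    for resource, util in with_logging.items():
        print(f"{resource}: {util:.0%} of capacity with logging overhead included")

The same adjustment applies to any other per-transaction overhead, such as auditing, that the new environment adds.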
DATA BASES

If a data base is transferred, there are two situations that impact performance: is the data base actually backed up and moved, or is it unloaded and reloaded? If the data base is moved without the unload, chances are that performance will be worse than expected, because the data base cannot be tuned for the new environment. If the data is unloaded and loaded into a newly created data base, the performance may be better than expected because of the potentially improved design of the data base on the new platform.

If there is a move to a new data base as part of the transition, the performance may or may not be similar. There is a higher chance of accurate prediction if both data bases are relational SQL-type data bases, for example. If one is networked and one relational, it is difficult to predict the performance of an application because of the change in access path to the data.

Selection of a new data base also may be a performance issue. There have been various benchmarks comparing different data bases, and even different releases of the same data base, on various hardware platforms. It is important to consider all known information, as was discussed in the earlier CPU and disk sections, before selecting a data base based on performance. Current data bases are differentiated more by feature sets than by performance. The recommended overall strategy is to select an open data base that performs well and has all the features needed. If there will be client/server access, be sure to consider the features and performance implications of that. It also may be wise to review the features and performance factors of the distributed data base capabilities of the candidates in making a selection.

Relational data bases have many features and capabilities that can and should be taken advantage of when possible. Some of these are the ability to use raw partitions, concurrency separation (indexes from tables), striping, spreading data across many less expensive drives and controllers, isolating transaction log files from other files and controllers, and configuring large memory and buffering areas. Each relational data base allows a different subset of these features.

A very important decision is whether to repair or improve deficiencies in the current data base structure as part of the migration or conversion. An example of this is a customer with a five-year-old relational data base. As the data base evolved and needs changed, the staff did not take the time to alter the table structure to meet those needs in the most efficient way. Instead, the expedient approach was to add new tables, creating redundant data structures and wasting I/O and CPU as a result. As part of this customer's conversion to a new relational data base, the integrator redesigned the tables and the 4th Generation Language access routines to repair the problem, and the resulting performance exceeded expectations. It may be very wise to purchase some data base design and tuning consulting from the data base vendor as part of the implementation plan, and to allow time for those functions in the project.

THIRD PARTY APPLICATIONS

As with the CPU and disk discussions earlier, if a third-party application is selected to replace an existing one, examine whatever data is available, including vendor information and benchmarks as well as any customer reference sites, for information on expected response. The data base used will also be part of this decision process. If an application is to be transferred, try to be sure it is a version optimized for the new platform rather than a straight port of the source code.
If you have to choose between a version that has been ported and one that was designed and written on the new alternative system, pick the native version; there is a high probability the performance will be better. Be conservative in your expectations. However, it has been my experience that the newer versions of these applications tend to run better than the old mainframe versions.

CLIENT/SERVER

If possible, move to client/server applications as part of the conversion effort. Often this is a REWRITE, but it also may be a REPLACE strategy. The performance implication of client/server is that cheaper MIPS will be doing the work, with similar response in most cases as long as the network is adequate. There are also all the other benefits associated with client/server technology, such as better tools and lower maintenance. With a move to client/server, the expensive MIPS are saved to be used for other things. In the SURROUND situation, the mainframe's life without an upgrade may be extended. In the REWRITE scenario, the new server will have a similar benefit and may be able to be a smaller CPU than was originally expected or sized based on a mainframe-CPU-to-alternate-CPU comparison.

It is very common to suggest a move to client/server as phase II of a migration project. This sounds like a good strategy, but in practice it seldom gets completed. Phase I is of course to get as much off the mainframe as possible to save money and gain the other benefits discussed. It is wise to size the system for phase I and not for a phase II that may or may not be in the future.

CONSULTING

In addition to the network, data base, and performance consulting mentioned earlier, migration planning and project management consulting may be necessary to implement the project in a timely fashion and to be sure that the performance needs and milestones are not overlooked. Integration and conversion consulting also may be needed. Be sure that all consulting is scheduled in the project and meshes with the requirements definition.

SUMMARY

There are many performance considerations in planning a mainframe alternative solution. This paper has attempted to discuss some of them at a high enough level to be applicable to a variety of situations. In whatever situation develops, some or all of these topics may come into play. The analyst or consultant must gather all the information available and proceed with the best business decision possible. It is critical to the successful completion of a mainframe alternative project that these considerations be given beforehand, so that surprises are minimized and performance service levels are met.